--- title: 01b02 RNA Velocity Data keywords: fastai sidebar: home_sidebar nb_path: "nbs/01 Datasets/01b02 Single-Cell Data and RNA Velocity.ipynb" ---
{% raw %}
{% endraw %}

RNA velocity is a high-dimensional vector that predicts the future state of individual cells on a timescale of hours. Authors of RNA Velocity expect it to greatly aid the analysis of developmental lineages and cellular dynamics, particularly in humans.

In this notebook, we have single-cell datasets with RNA velocity information. Functions return 2 $n \times d$ torch tensors and a list. The first tensor contains information of $n$ cells with $d$ features and the second tensor holds the vectors associated with the aforementioned $n$ cells in the same $d$ directions. The list contains the ground truth cell type labels of each of the $n$ cells.

scvelo Dataset Loaders

Call one of the following and input returned object into rnavelo or rnavelo_pcs to get data, flows, labels (and n_pcs) in processed form.

  • scvelo.datasets.pancreas()
  • scvelo.datasets.bonemarrow()
  • scvelo.datasets.dentategyrus()
  • scvelo.datasets.dentategyrus_lamanno()
  • scvelo.datasets.gastrulation_e75()
  • scvelo.datasets.gastrulation_erythroid()
  • scvelo.datasets.forebrain()
  • scvelo.datasets.gastrulation()
  • scvelo.datasets.pbmc68k()
  • scvelo.datasets.simulation(n_obs=300, n_vars=None, alpha=None, beta=None, gamma=None, alpha_=None, t_max=None, noise_model='normal', noise_level=1, switches=None, random_seed=0)
{% raw %}

rnavelo_find_cluster_key[source]

rnavelo_find_cluster_key(adata)

{% endraw %} {% raw %}

rnavelo_add_labels[source]

rnavelo_add_labels(adata)

{% endraw %} {% raw %}

rnavelo_preprocess[source]

rnavelo_preprocess(adata)

{% endraw %} {% raw %}

rnavelo[source]

rnavelo(adata)

{% endraw %} {% raw %}

rnavelo_pcs[source]

rnavelo_pcs(adata)

{% endraw %} {% raw %}

rnavelo_plot_pca[source]

rnavelo_plot_pca(adata, ax=None, show=True)

{% endraw %} {% raw %}
{% endraw %}

Examples

Bone Marrow Dataset

{% raw %}
adata = scv.datasets.bonemarrow()
rnavelo_plot_pca(adata)
Normalized count data: X, spliced, unspliced.
Logarithmized X.
computing neighbors
    finished (0:00:00) --> added 
    'distances' and 'connectivities', weighted adjacency matrices (adata.obsp)
computing moments based on connectivities
    finished (0:00:02) --> added 
    'Ms' and 'Mu', moments of un/spliced abundances (adata.layers)
computing velocities
    finished (0:00:04) --> added 
    'velocity', velocity vectors for each individual cell (adata.layers)
computing velocity graph (using 1/10 cores)
    finished (0:00:25) --> added 
    'velocity_graph', sparse matrix with cosine correlations (adata.uns)
computing velocity embedding
    finished (0:00:01) --> added
    'velocity_pca', embedded velocity vectors (adata.obsm)
2022-11-18T08:20:52.779069 image/svg+xml Matplotlib v3.5.2, https://matplotlib.org/
{% endraw %}

Pancreas Datasets

{% raw %}
adata = scv.datasets.pancreas()
X, flows, labels = rnavelo(adata)
Normalized count data: X, spliced, unspliced.
Logarithmized X.
computing neighbors
    finished (0:00:00) --> added 
    'distances' and 'connectivities', weighted adjacency matrices (adata.obsp)
computing moments based on connectivities
    finished (0:00:02) --> added 
    'Ms' and 'Mu', moments of un/spliced abundances (adata.layers)
computing velocities
    finished (0:00:07) --> added 
    'velocity', velocity vectors for each individual cell (adata.layers)
{% endraw %} {% raw %}
X2, flows2, labels2, n_pcs = rnavelo_pcs(adata)
WARNING: Did not normalize X as it looks processed already. To enforce normalization, set `enforce=True`.
WARNING: Did not normalize spliced as it looks processed already. To enforce normalization, set `enforce=True`.
WARNING: Did not normalize unspliced as it looks processed already. To enforce normalization, set `enforce=True`.
WARNING: Did not modify X as it looks preprocessed already.
computing moments based on connectivities
    finished (0:00:02) --> added 
    'Ms' and 'Mu', moments of un/spliced abundances (adata.layers)
computing velocities
    finished (0:00:07) --> added 
    'velocity', velocity vectors for each individual cell (adata.layers)
computing velocity graph (using 1/10 cores)
---------------------------------------------------------------------------
KeyboardInterrupt                         Traceback (most recent call last)
/Users/adjourner/Projects/FRED/01b02 Single-Cell Data and RNA Velocity.ipynb Cell 12 in <cell line: 1>()
----> <a href='vscode-notebook-cell:/Users/adjourner/Projects/FRED/01b02%20Single-Cell%20Data%20and%20RNA%20Velocity.ipynb#X14sZmlsZQ%3D%3D?line=0'>1</a> X2, flows2, labels2, n_pcs = rnavelo_pcs(adata)

/Users/adjourner/Projects/FRED/01b02 Single-Cell Data and RNA Velocity.ipynb Cell 12 in rnavelo_pcs(adata)
     <a href='vscode-notebook-cell:/Users/adjourner/Projects/FRED/01b02%20Single-Cell%20Data%20and%20RNA%20Velocity.ipynb#X14sZmlsZQ%3D%3D?line=47'>48</a>     scv.pp.pca(adata)
     <a href='vscode-notebook-cell:/Users/adjourner/Projects/FRED/01b02%20Single-Cell%20Data%20and%20RNA%20Velocity.ipynb#X14sZmlsZQ%3D%3D?line=49'>50</a> # calculate velocity pca and display pca plot (2 dimensions)
---> <a href='vscode-notebook-cell:/Users/adjourner/Projects/FRED/01b02%20Single-Cell%20Data%20and%20RNA%20Velocity.ipynb#X14sZmlsZQ%3D%3D?line=50'>51</a> scv.tl.velocity_graph(adata)
     <a href='vscode-notebook-cell:/Users/adjourner/Projects/FRED/01b02%20Single-Cell%20Data%20and%20RNA%20Velocity.ipynb#X14sZmlsZQ%3D%3D?line=51'>52</a> scv.tl.velocity_embedding(adata, basis='pca', direct_pca_projection=False)
     <a href='vscode-notebook-cell:/Users/adjourner/Projects/FRED/01b02%20Single-Cell%20Data%20and%20RNA%20Velocity.ipynb#X14sZmlsZQ%3D%3D?line=53'>54</a> X = torch.tensor(adata.obsm["X_pca"].copy())

File ~/miniforge3/envs/FRED/lib/python3.9/site-packages/scvelo/tools/velocity_graph.py:363, in velocity_graph(data, vkey, xkey, tkey, basis, n_neighbors, n_recurse_neighbors, random_neighbors_at_max, sqrt_transform, variance_stabilization, gene_subset, compute_uncertainties, approx, mode_neighbors, copy, n_jobs, backend)
    359 n_jobs = get_n_jobs(n_jobs=n_jobs)
    360 logg.info(
    361     f"computing velocity graph (using {n_jobs}/{os.cpu_count()} cores)", r=True
    362 )
--> 363 vgraph.compute_cosines(n_jobs=n_jobs, backend=backend)
    365 adata.uns[f"{vkey}_graph"] = vgraph.graph
    366 adata.uns[f"{vkey}_graph_neg"] = vgraph.graph_neg

File ~/miniforge3/envs/FRED/lib/python3.9/site-packages/scvelo/tools/velocity_graph.py:175, in VelocityGraph.compute_cosines(self, n_jobs, backend)
    172 n_obs = self.X.shape[0]
    174 # TODO: Use batches and vectorize calculation of dX in self._calculate_cosines
--> 175 res = parallelize(
    176     self._compute_cosines,
    177     range(self.X.shape[0]),
    178     n_jobs=n_jobs,
    179     unit="cells",
    180     backend=backend,
    181 )()
    182 uncertainties, vals, rows, cols = map(_flatten, zip(*res))
    184 vals = np.hstack(vals)

File ~/miniforge3/envs/FRED/lib/python3.9/site-packages/scvelo/core/_parallelize.py:128, in parallelize.<locals>.wrapper(*args, **kwargs)
    125 else:
    126     pbar, queue, thread = None, None, None
--> 128 res = Parallel(n_jobs=n_jobs, backend=backend)(
    129     delayed(callback)(
    130         *((i, cs) if use_ixs else (cs,)),
    131         *args,
    132         **kwargs,
    133         queue=queue,
    134     )
    135     for i, cs in enumerate(collections)
    136 )
    138 res = np.array(res) if as_array else res
    139 if thread is not None:

File ~/miniforge3/envs/FRED/lib/python3.9/site-packages/joblib/parallel.py:1043, in Parallel.__call__(self, iterable)
   1034 try:
   1035     # Only set self._iterating to True if at least a batch
   1036     # was dispatched. In particular this covers the edge
   (...)
   1040     # was very quick and its callback already dispatched all the
   1041     # remaining jobs.
   1042     self._iterating = False
-> 1043     if self.dispatch_one_batch(iterator):
   1044         self._iterating = self._original_iterator is not None
   1046     while self.dispatch_one_batch(iterator):

File ~/miniforge3/envs/FRED/lib/python3.9/site-packages/joblib/parallel.py:861, in Parallel.dispatch_one_batch(self, iterator)
    859     return False
    860 else:
--> 861     self._dispatch(tasks)
    862     return True

File ~/miniforge3/envs/FRED/lib/python3.9/site-packages/joblib/parallel.py:779, in Parallel._dispatch(self, batch)
    777 with self._lock:
    778     job_idx = len(self._jobs)
--> 779     job = self._backend.apply_async(batch, callback=cb)
    780     # A job can complete so quickly than its callback is
    781     # called before we get here, causing self._jobs to
    782     # grow. To ensure correct results ordering, .insert is
    783     # used (rather than .append) in the following line
    784     self._jobs.insert(job_idx, job)

File ~/miniforge3/envs/FRED/lib/python3.9/site-packages/joblib/_parallel_backends.py:208, in SequentialBackend.apply_async(self, func, callback)
    206 def apply_async(self, func, callback=None):
    207     """Schedule a func to be run"""
--> 208     result = ImmediateResult(func)
    209     if callback:
    210         callback(result)

File ~/miniforge3/envs/FRED/lib/python3.9/site-packages/joblib/_parallel_backends.py:572, in ImmediateResult.__init__(self, batch)
    569 def __init__(self, batch):
    570     # Don't delay the application, to avoid keeping the input
    571     # arguments in memory
--> 572     self.results = batch()

File ~/miniforge3/envs/FRED/lib/python3.9/site-packages/joblib/parallel.py:262, in BatchedCalls.__call__(self)
    258 def __call__(self):
    259     # Set the default nested backend to self._backend but do not set the
    260     # change the default number of processes to -1
    261     with parallel_backend(self._backend, n_jobs=self._n_jobs):
--> 262         return [func(*args, **kwargs)
    263                 for func, args, kwargs in self.items]

File ~/miniforge3/envs/FRED/lib/python3.9/site-packages/joblib/parallel.py:262, in <listcomp>(.0)
    258 def __call__(self):
    259     # Set the default nested backend to self._backend but do not set the
    260     # change the default number of processes to -1
    261     with parallel_backend(self._backend, n_jobs=self._n_jobs):
--> 262         return [func(*args, **kwargs)
    263                 for func, args, kwargs in self.items]

File ~/miniforge3/envs/FRED/lib/python3.9/site-packages/scvelo/tools/velocity_graph.py:235, in VelocityGraph._compute_cosines(self, obs_idx, queue)
    230         uncertainties.extend(
    231             np.nansum(dX ** 2 * moments[obs_id][None, :], 1)
    232         )
    234     vals.extend(val)
--> 235     rows.extend(np.ones(len(neighs_idx)) * obs_id)
    236     cols.extend(neighs_idx)
    238 if queue is not None:

File ~/miniforge3/envs/FRED/lib/python3.9/site-packages/numpy/core/numeric.py:204, in ones(shape, dtype, order, like)
    201 if like is not None:
    202     return _ones_with_like(shape, dtype=dtype, order=order, like=like)
--> 204 a = empty(shape, dtype, order)
    205 multiarray.copyto(a, 1, casting='unsafe')
    206 return a

KeyboardInterrupt: 
{% endraw %} {% raw %}
print(X.shape)
print(flows.shape)
print(labels.shape)

print(X2.shape)
print(flows2.shape)
print(labels2.shape)
print(n_pcs)
torch.Size([3696, 27998])
torch.Size([3696, 27998])
torch.Size([3696])
torch.Size([3696, 50])
torch.Size([3696, 50])
torch.Size([3696])
50
{% endraw %} {% raw %}
rnavelo_plot_pca(adata)
{% endraw %}

Dentate Gyrus

{% raw %}
adata = scv.datasets.dentategyrus()
X, flows, labels = rnavelo(adata)
scv.tl.velocity_graph(adata)
{% endraw %} {% raw %}
X, flows, labels, n_pcs = rnavelo_pcs(adata)
WARNING: Did not normalize X as it looks processed already. To enforce normalization, set `enforce=True`.
WARNING: Did not normalize spliced as it looks processed already. To enforce normalization, set `enforce=True`.
WARNING: Did not normalize unspliced as it looks processed already. To enforce normalization, set `enforce=True`.
WARNING: Did not modify X as it looks preprocessed already.
computing moments based on connectivities
    finished (0:00:02) --> added 
    'Ms' and 'Mu', moments of un/spliced abundances (adata.layers)
computing velocities
    finished (0:00:05) --> added 
    'velocity', velocity vectors for each individual cell (adata.layers)
computing velocity graph (using 1/8 cores)
    finished (0:00:30) --> added 
    'velocity_graph', sparse matrix with cosine correlations (adata.uns)
computing velocity embedding
    finished (0:00:00) --> added
    'velocity_pca', embedded velocity vectors (adata.obsm)
{% endraw %} {% raw %}
rnavelo_plot_pca(adata)
{% endraw %}

UMAP Plots by scVelo

Pancreas

The following cell gets the dataset, preprocesses it (including computing velocities), computes a transition probability matrix (the velocity graph), and then uses this to compute a low-dimensional embedding (using UMAP).

{% raw %}
adata = scv.datasets.pancreas()
rnavelo_preprocess(adata)
# calculate velocity pca and display pca plot (2 dimensions)
scv.tl.velocity_graph(adata)
scv.tl.velocity_embedding(adata, basis='umap')
Normalized count data: X, spliced, unspliced.
Logarithmized X.
computing neighbors
    finished (0:00:00) --> added 
    'distances' and 'connectivities', weighted adjacency matrices (adata.obsp)
computing moments based on connectivities
    finished (0:00:02) --> added 
    'Ms' and 'Mu', moments of un/spliced abundances (adata.layers)
computing velocities
    finished (0:00:07) --> added 
    'velocity', velocity vectors for each individual cell (adata.layers)
computing velocity graph (using 1/10 cores)
    finished (0:00:21) --> added 
    'velocity_graph', sparse matrix with cosine correlations (adata.uns)
computing velocity embedding
    finished (0:00:00) --> added
    'velocity_umap', embedded velocity vectors (adata.obsm)
{% endraw %}

Here's the annotated UMAP embedding. The points are embedded using UMAP, and the arrows are fit onto this with a simple heuristic.

{% raw %}
scv.pl.velocity_embedding_stream(adata, basis='umap')
2022-11-18T09:06:13.234476 image/svg+xml Matplotlib v3.5.2, https://matplotlib.org/
{% endraw %}

We want to extract the points and use them with our plotting functions, for consistency. Let's see the available keys:

{% raw %}
adata.obsm
AxisArrays with keys: X_pca, X_umap, velocity_umap
{% endraw %} {% raw %}
X = torch.tensor(adata.obsm["X_umap"].copy())
flows = torch.tensor(adata.obsm["velocity_umap"].copy())
labels = rnavelo_add_labels(adata)
{% endraw %} {% raw %}
from FRED.datasets import plot_directed_2d
plot_directed_2d(X,flows,labels, minimal=True)
2022-11-18T09:11:33.598031 image/svg+xml Matplotlib v3.5.2, https://matplotlib.org/
{% endraw %}

Bone Marrow

Repeating this process with the bone marrow dataset... For reasons I have yet to diagnose, scvelo automatically computes the UMAP embedding of the pancreas, but opts to compute the tSNE embedding of the bone marrow.

{% raw %}
adata = scv.datasets.bonemarrow()
rnavelo_preprocess(adata)
# calculate velocity and display plot (2 dimensions)
scv.tl.velocity_graph(adata)
Normalized count data: X, spliced, unspliced.
Logarithmized X.
computing neighbors
    finished (0:00:00) --> added 
    'distances' and 'connectivities', weighted adjacency matrices (adata.obsp)
computing moments based on connectivities
    finished (0:00:02) --> added 
    'Ms' and 'Mu', moments of un/spliced abundances (adata.layers)
computing velocities
    finished (0:00:04) --> added 
    'velocity', velocity vectors for each individual cell (adata.layers)
computing velocity graph (using 1/10 cores)
    finished (0:00:22) --> added 
    'velocity_graph', sparse matrix with cosine correlations (adata.uns)
{% endraw %}

We can see which embedding it computed by running:

{% raw %}
adata.obsm
AxisArrays with keys: X_pca, X_umap, velocity_umap
{% endraw %} {% raw %}
scv.tl.velocity_embedding(adata, basis='tsne')
computing velocity embedding
    finished (0:00:00) --> added
    'velocity_tsne', embedded velocity vectors (adata.obsm)
{% endraw %} {% raw %}
scv.pl.velocity_embedding_stream(adata, basis='tsne')
2022-11-18T09:23:53.894105 image/svg+xml Matplotlib v3.5.2, https://matplotlib.org/
{% endraw %}

We want to extract the points and use them with our plotting functions, for consistency. Let's see the available keys:

{% raw %}
X = torch.tensor(adata.obsm["X_tsne"].copy())
flows = torch.tensor(adata.obsm["velocity_tsne"].copy())
labels = rnavelo_add_labels(adata)
{% endraw %} {% raw %}
from FRED.datasets import plot_directed_2d
plot_directed_2d(X,flows,labels, minimal=True)
2022-11-18T09:24:40.587104 image/svg+xml Matplotlib v3.5.2, https://matplotlib.org/
{% endraw %}

Dentate Gyrus

{% raw %}
adata = scv.datasets.dentategyrus()
rnavelo_preprocess(adata)
# calculate velocity and display plot (2 dimensions)
scv.tl.velocity_graph(adata)
Normalized count data: X, spliced, unspliced.
Logarithmized X.
computing neighbors
    finished (0:00:00) --> added 
    'distances' and 'connectivities', weighted adjacency matrices (adata.obsp)
computing moments based on connectivities
    finished (0:00:01) --> added 
    'Ms' and 'Mu', moments of un/spliced abundances (adata.layers)
computing velocities
    finished (0:00:02) --> added 
    'velocity', velocity vectors for each individual cell (adata.layers)
computing velocity graph (using 1/10 cores)
    finished (0:00:09) --> added 
    'velocity_graph', sparse matrix with cosine correlations (adata.uns)
{% endraw %}

We can see which embedding it computed by running:

{% raw %}
adata.obsm
AxisArrays with keys: X_umap, X_pca
{% endraw %} {% raw %}
scv.tl.velocity_embedding(adata, basis='umap')
computing velocity embedding
    finished (0:00:00) --> added
    'velocity_umap', embedded velocity vectors (adata.obsm)
{% endraw %} {% raw %}
scv.pl.velocity_embedding_stream(adata, basis='umap')
2022-11-18T09:28:10.339946 image/svg+xml Matplotlib v3.5.2, https://matplotlib.org/
{% endraw %}

We want to extract the points and use them with our plotting functions, for consistency. Let's see the available keys:

{% raw %}
X = torch.tensor(adata.obsm["X_umap"].copy())
flows = torch.tensor(adata.obsm["velocity_umap"].copy())
labels = rnavelo_add_labels(adata)
{% endraw %} {% raw %}
from FRED.datasets import plot_directed_2d
plot_directed_2d(X,flows,labels, minimal=True)
2022-11-18T09:28:22.889316 image/svg+xml Matplotlib v3.5.2, https://matplotlib.org/
{% endraw %}